Mirror of linear changes #2
Open
kwyss-nvidia wants to merge 56 commits into kwyss/cublas_gemm_github_mr from
07d55ea to 8bb7d63
c3eebe7 to b848509
8bb7d63 to 365a4d9
b848509 to 1058efc
6c70366 to 08aa4de
eee37bf to ce4ca80
51fbe41 to 78c194d
ce4ca80 to 5ebc93a
78c194d to 8f4f0f0
1d112ac to 48648a9
5aa279e to 8466c36
ca005ab to e35f2b6
8466c36 to e788ca2
22828fe to 413331d
e788ca2 to 9ac89ea
db5b49e to 8d59b0a
9ac89ea to fa019d5
8d59b0a to 3424dc7
fa019d5 to cd3e414
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Use dummy wgrads for lower memory consumption
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Bug fix to avoid sharing gradients.
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Disable automatic use of batch_p2p_comm for CP2
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Change weight to origin_weight for LN_LINEAR
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: zhongboz <zhongboz@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Minor stylistic tweaks and typo fixes
  Review suggestions from @ptrendx
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix incorrect col strides for MXFP8 matrices
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
d7775fc to b62d555
Apply MR comment change.
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: kwyss-nvidia <kwyss@nvidia.com>
8fc753d to 67e790b
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
6948759 to ea9e46b
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* scaling enum abstract
* rm NVTE_ from ScalingMode names
* rework scaling mode enum in grouped gemm
* fix norm sharding
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
…r op backward (NVIDIA#1646)
Explicitly specify quantized tensor usages needed for linear op backward
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug checkpointing with te.Sequential
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Xin Yao <yaox12@outlook.com>
Signed-off-by: Xin Yao <yaox12@outlook.com>
A more reviewable mirror of the changes from NVIDIA#1559